509次阅读 | 发布于4年以前

Hi，今天和大家一起学习Go语言的字典。Go语言的字典又称为map，一种使用广泛的数据结构。它是拥有key/value对元素的「无序集合」，而且在集合中key必须是唯一的。

声明和初始化

声明一个字典的语法：

var 名字 map[key的类型] value的类型

看几个实际的例子：

package main

import "fmt"

func main()  {
    var m1 map[string] int // 声明
    m2 := map[string]int{}    // 声明并初始化一个空的map
    m3 := make(map[string] int)    // 声明并初始化一个空的map
    m4 := make(map[string] string, 2)    // 声明并初始化一个空的map

    fmt.Printf("m1=%+v \n", m1)
    fmt.Printf("m2=%+v \n", m2)
    fmt.Printf("m3=%+v \n", m3)
    fmt.Printf("m4=%+v \n", m4)
}

声明一个字典主要包含 「名字」 「key的类型」 「value的类型」其中key的类型是除「函数」、「字典」、「切片」以外的其他类型，value可以是任意类型。

细心的同学已经发现了，我们的示例中「m4 := make(map[string] string, 2)」，除了声明key的类型和value的类型外还定义了一个「2」的长度。它作用是当长度小于一个bucket上面存放的key/value对时，go会直接从堆上分配存储空间。这样运行效率更高。至于什么是bucket，一个bucket上最多存放多少key/value对，我们后面会介绍。

「为什么key不能是函数、字典、切片类型呢？」 因为在根据字典的key寻找value时，需要判断传入的key值和存储的key值是否相等，所以key的类型必须支持判等操作。而函数、字典、切片三种类型的值是不能支持判等的。

一般建议使用基本数据类型作为key，因为基本数据类型的判等操作往往性能更优。这里的基本数据类型包括：布尔类型、整数类型、浮点数类型、复数类型、指针类型。

当选择字符串作为key值时，其性能优劣取决于字符串的长度，长度越长求hash值越慢。

当采用数组作为key时，计算hash值需要计算数组中每一个元素的hash值再做合并操作。采用结构体类型作为key时和数组类似，需要对其所有字段求hash然后合并。数组和结构体的hash操作比基本类型要低效。

interface类型作为key时，需要保证interface的具体值是可判等，否则会导致运行时panic。

有些情况下，我们需要使用切片（函数、字典同理）作为key时，但是字典又不允许key是切片，此时我们可以定义一个工具函数，接收切片，返回和切片一一对应的字符串，然后把字符串当作key。获取字段的值时，key就成了工具函数的返回值。

package main

import "fmt"

func main()  {
   m := map[string]int{}
   l := []string{"a","b"}

   m[tool(l)] = 100 //tool函数的返回值，作为key
   fmt.Printf("m=%+v",m)  // m=map[["a" "b"]:100]
}

// 工具函数 
func tool(l []string)  string{
   return fmt.Sprintf("%q",l)
}

常用操作

赋值、访问值、删除key、遍历

package main

import "fmt"

func main()  {
   m := map[string]int{}

   m["a"] = 100 // 赋值
   m["b"] = 200 // 赋值
   m["c"] = 300 // 赋值
   m["d"] = 400 // 赋值

   fmt.Printf("%d \n",m["a"]) //取值 100
   delete(m, "a") // 删除key

   // 遍历
   for k,v := range m{ 
      fmt.Printf("k=%s,v=%d \n",k,v)
   }
   fmt.Printf("m=%+v \n",m) // m=map[b:200]
}

我们遍历map时返回的key值是无序的，如果开发中有排序的需要，我们可以把map的key放到一个数组中排序好之后，遍历数组然后按照key取值。

原理简介

Go语言中的map采用hash table实现。

具体的Hash算法,Go语言根据当前CPU的架构判断使用AES hash，RSA、SHAX等不同的算法。既然是hash table那就面临hash冲突的问题，解决hash冲突通常有两种方法：「开放寻址法」 「链地址法」

开放寻址法：开放寻址法又分为线性探测法、平方探测法、随机探测法和双重哈希法等几种方法，其原理就是通过不停的探测，直至寻找到与当前hash表中不冲突的key值。

链地址法：就是把hash冲突的key值，通过链表的形式存储起来。

这两种方法有不同的适用场景，Go语言中采用链地址发来解决hash冲突。

Go语言采用hmap结构体存储map，其结构如下

// A header for a Go map.
type hmap struct {
   // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
   // Make sure this stays in sync with the compiler's definition.
   count     int // # live cells == size of map.  Must be first (used by len() builtin)
   flags     uint8
   B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
   noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
   hash0     uint32 // hash seed

   buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
   oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
   nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

   extra *mapextra // optional fields
}

map数据被存放在bucket数组中，每一个bucket包含最多8个key/value对。

hash值的低位用来定位bucket，高位用来区分bucket内的key。

当多于8个key/value对需要存储在一个bucket上的时候，我们通过链接到额外的buckets来存储数据（也就是我们上文中说到的链地址法解决hash冲突）。

当hashtable需要扩容时，会分配一个新的数组，长度是原来的两倍，然后把老的bucket上的数据，逐步迁移到新数组中。

This file contains the implementation of Go's map type. A map is just a hash table. The data is arranged into an array of buckets. Each bucket contains up to 8 key/elem pairs. The low-order bits of the hash are used to select a bucket. Each bucket contains a few high-order bits of each hash to distinguish the entries within a single bucket. If more than 8 keys hash to a bucket, we chain on extra buckets. When the hashtable grows, we allocate a new array of buckets twice as big. Buckets are incrementally copied from the old bucket array to the new bucket array. Map iterators walk through the array of buckets and return the keys in walk order (bucket #, then overflow chain order, then bucket index). To maintain iteration semantics, we never move keys within their bucket (if we did, keys might be returned 0 or 2 times). When growing the table, iterators remain iterating through the old table and must check the new table if the bucket they are iterating through has been moved ("evacuated") to the new table. Picking loadFactor: too large and we have lots of overflow buckets, too small and we waste a lot of space. I wrote a simple program to check some stats for different loads: (64-bit, 8 byte keys and elems)

%overflow = percentage of buckets which have an overflow bucket bytes/entry = overhead bytes used per key/elem pair hitprobe = # of entries to check when looking up a present key missprobe = # of entries to check when looking up an absent key Keep in mind this data is for maximally loaded tables, i.e. just before the table grows. Typical tables will be somewhat less loaded.

Go语言map的实现源码在runtime/map.go文件中，感兴趣的同学可以自行阅读。

总结

本文主要介绍了Go语言字典的声明、初始化和常用方法。还简要介绍了Go语言中Map的实现原理。还需要补充一点map本身并不是并发安全的数据结构。在并发使用的情况下需要加锁（sync.Mutex/syncRwMutex)，或者使用Go的标准库sync.Map。

Go字典（值得收藏）

声明和初始化

常用操作

原理简介

总结