Explanations
collective nouns for animals
Negative Logits
flock     0.57
herd      0.54
herd      0.53
swarm     0.52
horde     0.49
群        0.46
群        0.46
Herd      0.45
群体      0.43
hordes    0.42
Positive Logits
murders   0.50
murder    0.46
bachelor  0.46
Murder    0.45
Murder    0.45
murder    0.44
cyn       0.43
noisy     0.42
congress  0.41
congress  0.40
Activations Density: 0.006%