INDEX
Explanations
keywords related to actions or behaviors of people
references to people and their actions or states
Negative Logits
predecessor    -0.96
srfAttach      -0.95
ãĥ¯            -0.76
successor      -0.73
counterpart    -0.70
éŃĶ            -0.70
è¡             -0.69
inth           -0.67
ausp           -0.66
master         -0.66
Positive Logits
clam           1.09
complaining    0.88
dying          0.88
grav           0.88
distrust       0.86
noticing       0.85
hating         0.85
starving       0.84
afraid         0.83
flock          0.83
Activations Density 0.273%
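
Three entries in the negative-logits list (ãĥ¯, éŃĶ, è¡) are not readable words. They look like tokens printed in the byte-level BPE display alphabet used by GPT-2-style tokenizers, where each raw byte is shown as one printable character. That is an assumption: the page does not name the model or tokenizer. Under that assumption, the sketch below rebuilds the byte-to-character table and inverts it to recover the underlying bytes. The helper names (`gpt2_byte_decoder`, `token_to_bytes`, `token_to_text`) are hypothetical and exist only for this illustration.

```python
def gpt2_byte_decoder():
    """Return a dict mapping each display character back to its raw byte.

    Assumption: the tokens use the GPT-2 byte-level display alphabet.
    Printable bytes map to themselves; the remaining bytes are shifted
    to code points starting at 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


DECODER = gpt2_byte_decoder()


def token_to_bytes(token: str) -> bytes:
    """Convert a displayed token string to the raw bytes it stands for."""
    return bytes(DECODER[ch] for ch in token)


def token_to_text(token: str) -> str:
    """Decode the raw bytes as UTF-8; incomplete sequences are replaced."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The three opaque tokens from the negative-logits list.
    for tok in ["ãĥ¯", "éŃĶ", "è¡"]:
        print(tok, token_to_bytes(tok).hex(), token_to_text(tok))
```

If the assumption holds, ãĥ¯ corresponds to the bytes e3 83 af (the katakana ワ) and éŃĶ to e9 ad 94 (the character 魔). è¡ corresponds to e8 a1, which is only the first two bytes of a three-byte UTF-8 sequence, so it does not identify a single character on its own.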