Explanations
mentions of people in the context of social or political events
phrases that indicate people's actions and feelings
Negative Logits
Token         Logit
srfAttach     -0.94
predecessor   -0.79
berus         -0.73
successor     -0.65
èĢħ           -0.64
Accessory     -0.63
Equip         -0.62
ãĥ¯           -0.62
Reboot        -0.62
pupil         -0.61
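
Two of the tokens listed above (èĢħ and ãĥ¯) look like encoding debris but are more likely raw byte-level BPE token strings. The sketch below shows one way to map such strings back to readable text. It assumes the underlying model uses a GPT-2-style byte-level tokenizer, which this page does not state; the helper names `byte_level_table` and `decode_token` are hypothetical and introduced only for illustration.

```python
def byte_level_table():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumption: the model behind this page uses it)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_level_table().items()}


def decode_token(token: str) -> str:
    """Turn a displayed token string back into UTF-8 text.

    Incomplete byte sequences (a token may hold only part of a
    multi-byte character) are rendered with the replacement character.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["èĢħ", "ãĥ¯", "pupil"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, èĢħ decodes to 者 and ãĥ¯ decodes to ワ, while plain ASCII tokens such as pupil are unchanged.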
Positive Logits
Token              Logit
clam               0.97
misunderstand      0.78
misinterpret       0.73
misunderstanding   0.71
wanting            0.71
alike              0.69
understandably     0.68
murm               0.68
fooled             0.67
grav               0.67
Activations Density: 0.250%