Explanations
geographical locations and events
Negative Logits
Token         Value
nai           -0.75
pring         -0.64
itionally     -0.62
alright       -0.60
ãĥ´           -0.60
iet           -0.60
ain           -0.59
inates        -0.56
âĻ            -0.55
perk          -0.53
Positive Logits
Token         Value
Goldstein     0.72
there         0.69
there         0.65
orsi          0.65
Journalists   0.61
Hoffman       0.60
tis           0.59
Fundamental   0.57
atever        0.57
attackers     0.56
Activations Density: 0.243%