Explanations
places or organizations mentioned in a social media post about various topics
Negative Logits
recip      -0.82
tatt       -0.76
interns    -0.75
antip      -0.75
opp        -0.74
hob        -0.72
Opp        -0.71
Johnston   -0.69
HIP        -0.69
ank        -0.68
Positive Logits
ve         1.30
vel        1.21
ure        1.19
ves        1.18
ver        1.17
vell       1.14
ving       1.13
vere       1.13
ura        1.07
ur         1.07
Activations Density: 0.172%