INDEX
Explanations
references to a specific person's name
Negative Logits
ivity       -0.80
ICAN        -0.70
ivism       -0.69
ivities     -0.65
Yugoslav    -0.64
Lisbon      -0.63
ential      -0.63
hs          -0.62
REDACTED    -0.62
Catalan     -0.61
Positive Logits
orthy       0.86
stown       0.86
sey         0.83
arty        0.81
cock        0.78
bare        0.74
afort       0.73
quist       0.72
hattan      0.71
ORPG        0.70
Activations Density 0.047%