INDEX
Explanations
references to political entities or controversial historical events
references to political entities and historical contexts
Negative Logits
silence      -0.63
bernatorial  -0.63
gements      -0.63
ezvous       -0.62
gment        -0.62
gments       -0.62
verage       -0.61
formats      -0.61
eanor        -0.60
uality       -0.60
Positive Logits
onwards     0.73
Ended       0.62
Logged      0.61
ateurs      0.60
contractor  0.59
=================================================================  0.58
iets        0.57
anmar       0.57
edes        0.56
senal       0.55
Activations Density: 0.661%