INDEX
Explanations
political and legal terms or phrases
punctuation marks and symbols used in textual context
Negative Logits
orer     -0.73
oir      -0.71
ARCH     -0.68
OND      -0.64
isers    -0.61
NK       -0.61
Gw       -0.60
alty     -0.57
SPA      -0.57
ocial    -0.57
Positive Logits
FontSize        0.67
76561           0.64
respectively    0.64
bet             0.63
Kard            0.62
otos            0.61
destro          0.61
disg            0.60
etheless        0.60
signifies       0.60
Activations Density 0.742%