INDEX
Explanations
words related to political entities or organizations
segments of text that are entirely blank or contain minimal content
Negative Logits
terday      -0.92
wise        -0.77
sanity      -0.72
parity      -0.70
theless     -0.68
etheless    -0.67
unison      -0.66
Extras      -0.65
LAST        -0.63
coupled     -0.62
Positive Logits
onian       1.28
leys        1.04
intern      0.96
opian       0.95
orian       0.95
mans        0.93
osphere     0.93
seys        0.93
isphere     0.91
venth       0.90
Activations Density 0.254%