INDEX
Explanations
names of politicians and public figures
names of specific individuals and entities from various contexts
Negative Logits
rece            -0.57
thereof         -0.56
)).             -0.55
$.              -0.53
EStreamFrame    -0.53
thereto         -0.52
respectively    -0.52
disguise        -0.51
}.              -0.48
orsi            -0.48
Positive Logits
udos            0.55
spokesman       0.53
argues          0.51
acknowledges    0.50
believes        0.49
surprisingly    0.47
rik             0.47
maintains       0.46
tweeted         0.46
wat             0.45
Activations Density 0.979%