Explanations
phrases related to public statements or actions
occurrences of the word "public" and its derivatives
Negative Logits
kson      -0.81
llan      -0.77
vell      -0.73
imoto     -0.68
rax       -0.68
nesota    -0.67
nian      -0.67
Film      -0.67
ansas     -0.66
webkit    -0.66
Positive Logits
relations      1.16
izing          1.04
ised           0.99
shaming        0.98
ally           0.95
outcry         0.94
ising          0.93
humiliation    0.93
servants       0.90
domain         0.89
Activations Density 0.053%