INDEX
Explanations
phrases related to personal integrity and relationships
after "who," "what," "it," or "the."
foreign language words
Negative Logits
rid        -0.73
ten        -0.73
hom        -0.73
un         -0.70
now        -0.69
fun        -0.67
do         -0.67
per        -0.67
minimal    -0.67
me         -0.66
Positive Logits
selves        0.73
And           0.68
pregunto      0.65
tasche        0.65
alemania      0.65
tiroirs       0.65
cœurs         0.63
stratég       0.63
regeringen    0.62
anún          0.62
Activations Density: 0.212%