Explanations
phrases involving prepositions followed by a name or a pronoun
references to specific groups and their attitudes or actions
Negative Logits
obin           -0.75
avis           -0.74
resses         -0.67
%%%%           -0.66
guiActiveUn    -0.66
activate       -0.63
roup           -0.61
ASS            -0.60
bernatorial    -0.60
onew           -0.60
Positive Logits
nen            0.65
there          0.64
gaard          0.63
it             0.62
however        0.58
enment         0.58
çīĪ            0.57
anes           0.57
Frankfurt      0.56
seeing         0.56
Activations Density 0.174%