INDEX
Explanations
text related to political consultations, commitments, and affiliations
symbols or characters that indicate formatting or coding issues
Negative Logits
accomp      -0.63
Imper       -0.62
mash        -0.62
encount     -0.59
crus        -0.58
wors        -0.57
denomin     -0.56
magn        -0.56
Trojan      -0.56
backdrop    -0.56
Positive Logits
him         1.20
them        1.10
their       1.04
his         0.95
selves      0.94
gently      0.92
onto        0.92
DERR        0.89
your        0.85
ationally   0.85
Activations Density: 0.312%