INDEX
Explanations
phrases related to political figures and officials
proper nouns, especially names and titles related to individuals and their roles
Negative Logits
Token       Logit
resil       -0.78
unden       -0.67
cumbers     -0.65
elig        -0.61
conflic     -0.60
evasion     -0.59
theoret     -0.59
Palestin    -0.58
)=          -0.58
capacitor   -0.57
Positive Logits
Token       Logit
Jr          1.21
III         0.99
whom        0.97
Sr          0.95
udeau       0.94
ovich       0.89
enegger     0.88
who         0.86
vich        0.85
*.          0.84
Activations Density: 0.296%