Explanations
proper nouns related to politics and public figures
variations of the verb "to be"
NEGATIVE LOGITS
Token        Logit
Scroll       -0.77
aughtered    -0.77
jandro       -0.76
rones        -0.76
aughters     -0.74
inav         -0.73
ipeg         -0.72
urry         -0.72
letal        -0.72
Starts       -0.72
POSITIVE LOGITS
Token        Logit
bluff        1.29
joking       1.29
exagger      1.27
kidding      1.27
delusional   1.22
guilty       1.19
trying       1.16
unaware      1.16
aware        1.14
sincere      1.13
Activations Density 0.285%