INDEX
Explanations
words related to trickery or deception
mentions of the term "fool" and its variations
Negative Logits
Parenthood      -0.81
lined           -0.63
accompan        -0.63
ials            -0.61
orney           -0.60
capacity        -0.60
Balt            -0.59
disadvantage    -0.58
battle          -0.58
lining          -0.58
Positive Logits
ery             0.97
hard            0.93
pas             0.90
sonian          0.84
ulent           0.83
eries           0.82
ument           0.81
usional         0.79
ibility         0.78
hemer           0.78
Activations Density: 0.055%