INDEX
Explanations
phrases related to intentions and reasons
phrases indicating certainty, intent, and lack of evidence
Negative Logits
ahime    -0.79
pherd    -0.65
ortment  -0.62
tsky     -0.62
iets     -0.61
igs      -0.61
visor    -0.60
types    -0.60
acerb    -0.60
ork      -0.60
Positive Logits
whatsoever   1.84
nor          1.15
anymore      0.95
anywhere     0.85
hesitation   0.80
anybody      0.77
except       0.75
slightest    0.75
EVER         0.74
necessarily  0.74
Activations Density: 0.169%