INDEX
Explanations
particular words indicating emphasis or strong assertion
the phrase "that" in various contexts
Negative Logits
tails    -0.77
AMS      -0.70
HT       -0.68
oS       -0.67
hm       -0.67
hens     -0.67
hen      -0.66
umat     -0.66
guard    -0.66
orah     -0.64
Positive Logits
they         0.84
justifies    0.76
unless       0.73
anybody      0.71
anyone       0.69
witches      0.69
contradicts  0.68
'[           0.68
there        0.68
milo         0.67
Activations Density 0.224%