Explanations
personal pronouns followed by a statement
Negative Logits
iquette   -0.62
toget     -0.61
ãĤ¼       -0.60
Hels      -0.60
Redditor  -0.59
eatures   -0.59
redients  -0.57
è£ıè      -0.56
Abyss     -0.55
pires     -0.54
Positive Logits
think     1.42
'm        1.37
mean      1.25
've       1.18
guess     1.17
don       1.16
suppose   1.05
dunno     1.04
'd        1.02
wouldn    1.02
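The two lists give the tokens whose logits the feature most decreases and most increases. Below is a minimal sketch of how lists like these can be produced. It assumes, as is common for SAE dashboards but not stated on this page, that each score is the dot product of the feature's decoder direction with a token's unembedding vector. The toy vectors and the helper names `logit_effects` and `top_logit_effects` are hypothetical.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logit_effects(scores: Dict[str, float], k: int) -> Tuple[Scored, Scored]:
    """Return (negative, positive): the k lowest and k highest scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive


# Hypothetical 3-dimensional toy example; real models use far larger dims.
decoder_dir = [1.0, 0.5, -0.2]
unembed = {
    "think": [1.0, 0.8, 0.0],
    "guess": [0.9, 0.4, 0.1],
    "Abyss": [-0.4, -0.3, 0.5],
    "pires": [-0.2, -0.5, 0.3],
}
neg, pos = top_logit_effects(logit_effects(decoder_dir, unembed), k=2)
print(neg)
print(pos)
```

Sorting the full score table once and slicing both ends keeps the negative and positive lists consistent with each other; for a real vocabulary, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` avoids a full sort.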
Activation Density: 0.177%
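Activation density is read here as the percentage of sampled tokens on which the feature's activation is above zero. This definition is an assumption about how the dashboard computes it. Under that reading, 0.177% means the feature fires on roughly 1.77 tokens per 1,000. A minimal sketch, using a hypothetical helper `activation_density` and made-up activations:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active tokens out of 1,000.
acts = [0.0] * 998 + [3.1, 0.7]
print(f"{activation_density(acts):.3f}%")  # 0.200%
```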