INDEX
Explanations
phrases indicating a personal issue or problem
the repetitive use of the word "this."
Negative Logits
²¾          -0.80
rior        -0.69
ankind      -0.69
isms        -0.68
elcome      -0.68
ãĤ·ãĥ£      -0.67
master      -0.65
idel        -0.65
lee         -0.65
ãĥĥ         -0.65
Positive Logits
kind        0.98
scenario    0.94
enthusi     0.94
trope       0.93
happen      0.92
type        0.91
sort        0.88
tactic      0.84
stuff       0.84
sucker      0.83
Activations Density 0.231%