Explanations
phrases that emphasize a point or convey strong opinions
instances of the word "that" in various contexts
NEGATIVE LOGITS
arson      -0.80
emis       -0.74
ATURES     -0.74
atur       -0.73
IVERS      -0.71
ciples     -0.71
asures     -0.70
ãĥ©ãĥ³     -0.68
urch       -0.66
avers      -0.66
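The token shown as ãĥ©ãĥ³ looks like the raw vocabulary form of a byte-level BPE token rather than encoding damage. A minimal sketch of how such a string can be mapped back to readable text, assuming the underlying model uses a GPT-2-style byte-to-character table (the page itself does not name the tokenizer, and `decode_token` is a hypothetical helper introduced here for illustration):

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a vocabulary string back into UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The token from the table above, written with escapes to avoid normalization issues.
garbled = "\u00e3\u0125\u00a9\u00e3\u0125\u00b3"
print(decode_token(garbled))
```

Under that assumption, the entry decodes to a two-character katakana sequence, while plain ASCII tokens such as arson pass through unchanged.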
POSITIVE LOGITS
guy         0.94
translates  0.92
shouldn     0.92
kind        0.91
ain         0.90
sounds      0.88
doesn       0.88
reminds     0.88
proves      0.88
happens     0.87
Activations Density 0.131%
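The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature has a non-zero activation; the function name and the activation values below are synthetic and purely illustrative:

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is above zero.

    Assumption: "density" means the share of non-zero activations; the
    dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Synthetic example: 131 active positions out of 100,000 sampled tokens.
synthetic = [1.0] * 131 + [0.0] * (100_000 - 131)
density = activation_density(synthetic)
print(f"{density:.3%}")
```

On this reading, a density of 0.131% would correspond to roughly 131 active positions per 100,000 tokens.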