Explanations
descriptions or statements involving people
pronouns and references to people in sentences
Negative Logits
�                   -0.68
''                  -0.63
guiActiveUnfocused  -0.59
âĢº                 -0.57
``                  -0.56
Tau                 -0.56
è¦ļéĨĴ              -0.56
prompting           -0.54
exclaim             -0.53
β                   -0.52
Positive Logits
%"                  1.02
withstanding        0.99
"—                  0.97
resa                0.92
itage               0.92
odore               0.90
pherd               0.90
"[                  0.87
chwitz              0.86
ntil                0.85
Activations Density 0.334%