INDEX
Explanations
specific phrases enclosed in quotation marks
quotations or dialogue marks in the text
Negative Logits
Token      Logit
challeng   -0.69
mitter     -0.65
jection    -0.64
mates      -0.63
describ    -0.63
puberty    -0.62
apon       -0.61
İĭ         -0.58
rank       -0.58
ments      -0.58
Positive Logits
Token      Logit
/"         0.94
/>         0.75
kered      0.74
Stud       0.74
Elsa       0.73
Minecraft  0.70
["         0.70
mund       0.70
>>\        0.69
é¾į        0.69
Activations Density: 0.066%