INDEX
Explanations
specific trigger words like "When"
occurrences of the word "When."
Negative Logits
kaya         -0.88
whatever     -0.74
SPONSORED    -0.72
\\\\\\\\     -0.67
oof          -0.67
Í            -0.67
OTHER        -0.66
bart         -0.66
ding         -0.66
aking        -0.66
Positive Logits
asked        1.28
confronted   1.17
soever       1.10
pressed      1.05
faced        1.03
contacted    0.96
discussing   0.95
questioned   0.95
comparing    0.91
ce           0.89
Activations Density 0.080%