Explanations
the word "literally" in the text
the word "literally" used to emphasize statements
Negative Logits
icipated   -0.80
lain       -0.78
winner     -0.76
ies        -0.76
eway       -0.75
ramid      -0.73
lings      -0.71
alez       -0.69
resses     -0.69
iers       -0.69
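These lists most likely follow the usual SAE-dashboard convention, though that is an assumption. Under it, each value is the dot product of the feature's decoder direction with one token's column of the unembedding matrix. The negative and positive lists are then the bottom and top k entries of that vector.

The pure-Python sketch below shows that computation. The decoder direction, the 3-dimensional unembedding `w_u`, and the function names are all hypothetical toy values, not this model's weights.

```python
# Sketch only: assumes each logit effect = decoder_dir . W_U[:, token].
# All weights below are hypothetical toy values.

def logit_effects(decoder_dir, w_u):
    """Return one score per vocabulary token: decoder_dir dotted with each column of w_u."""
    d_model, vocab_size = len(w_u), len(w_u[0])
    assert len(decoder_dir) == d_model, "decoder direction must match d_model"
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def bottom_and_top(effects, tokens, k):
    """Split scores into the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Hypothetical toy model: d_model = 3, vocabulary of 5 tokens.
tokens = ["terday", "torch", "winner", "lings", "reinvent"]
decoder_dir = [0.5, -0.2, 0.1]
w_u = [
    [1.0, 0.2, -1.0, -0.4, 0.6],
    [0.0, -1.0, 0.5, 0.9, -0.3],
    [0.3, 0.4, -0.2, 0.0, 0.5],
]

negative, positive = bottom_and_top(logit_effects(decoder_dir, w_u), tokens, k=2)
for name, pairs in (("Negative", negative), ("Positive", positive)):
    print(name, [(tok, round(score, 2)) for tok, score in pairs])
```

A real model has tens of thousands of vocabulary entries, so the same top-k step would normally be done with a vectorized matrix product rather than Python loops.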
Positive Logits
terday     0.73
ãĤ§        0.71
Ń·         0.71
reinvent   0.68
ãĤ¡        0.68
UNCH       0.66
torch      0.65
speaking   0.65
ruciating  0.64
piss       0.64
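The strings ãĤ§, ãĤ¡ and Ń· look garbled for a specific reason, assuming the model uses a GPT-2-style byte-level BPE vocabulary. In that scheme each token is stored with every byte mapped to a printable character. Reversing the mapping shows that ãĤ§ and ãĤ¡ are the small katakana ェ and ァ. Ń·, by contrast, is a two-byte fragment (0xAD 0xB7) that is not valid UTF-8 on its own.

The sketch below reverses that byte mapping. The function names are illustrative.

```python
# Sketch assuming a GPT-2-style byte-level BPE vocabulary, where every byte is
# stored as a printable Unicode character inside the token string.

def bytes_to_unicode():
    """Build GPT-2's byte -> printable-character table."""
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # inverted exclamation .. not sign
        + list(range(0xAE, 0x100))  # registered sign .. y with diaeresis
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + n)  # shift unprintable bytes past Latin-1
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a raw vocabulary string back to bytes, then decode them as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Byte fragments that are not complete UTF-8 sequences become U+FFFD.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤ§", "ãĤ¡", "Ń·", "torch"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```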
Activations Density 0.015%
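This sketch assumes "Activations Density" is the fraction of token positions on which the feature's activation is above zero, the usual SAE-dashboard definition. Under that reading, 0.015% corresponds to about 15 firing tokens per 100,000, or roughly one in every 6,700. The activations and the `threshold` parameter below are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical toy data: the feature fires on 3 out of 20,000 tokens.
acts = [0.0] * 20_000
for pos in (17, 9_001, 15_432):
    acts[pos] = 2.5

print(f"{activation_density(acts):.3%}")  # 0.015%
```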