INDEX
Explanations
names and references to various individuals, such as celebrities and sports figures
repeated mentions of the substring "ra"
NEGATIVE LOGITS
iaries       -0.78
ij士         -0.74
regor        -0.72
lace         -0.71
curfew       -0.69
GOODMAN      -0.69
é¾           -0.67
charism      -0.66
MacArthur    -0.65
OW           -0.63
POSITIVE LOGITS
irie          1.21
ven           1.19
fter          1.18
fters         1.16
eus           1.11
ving          1.07
xon           1.05
ppy           1.03
plets         1.03
ffe           1.00
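In SAE feature dashboards of this kind, the positive and negative logit lists are usually obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the tokens with the largest and smallest results. The sketch below assumes exactly that and ignores the final layer norm, which some tools apply first. The function names `logit_effects` and `top_and_bottom`, the d_model x vocab_size layout of `unembed`, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of the feature direction on each token's logit.

    Assumes `unembed` has d_model rows and vocab_size columns, so
    effect[t] = sum_i decoder_dir[i] * unembed[i][t]. Layer norm is ignored.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]             # most negative first
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy example (hypothetical numbers): d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, 1.0, -0.3, 0.0],
    [0.4, -0.6, 0.2, 1.0],
]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, vocab, k=2)
print("POSITIVE:", positive)
print("NEGATIVE:", negative)
```

With a real model, `decoder_dir` would be one row of the SAE decoder and `unembed` the model's unembedding matrix, and `k = 10` would give lists the same length as the ones above.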
Activations Density 0.025%
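Activation density is commonly defined as the fraction of dataset tokens on which the feature is active, meaning its activation is nonzero after the SAE's ReLU. Under that assumption, a density of 0.025% means the feature fires on roughly 1 token in 4,000 (0.025 / 100 = 0.00025 = 1/4000). The minimal sketch below assumes a firing threshold of 0. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example (hypothetical data): the feature fires on 1 token out of 4000.
acts = [0.0] * 3999 + [2.5]
density = activation_density(acts)
print(f"{density * 100:.3f}%")                                  # density as a percentage
print(round(1 / density), "tokens per firing, on average")      # inverse of the density
```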