INDEX
Explanations
spoilers in a text
phrases that indicate the presence of spoilers in a text
Negative Logits
Token     Logit
kt        -0.73
ndra      -0.73
urat      -0.69
rique     -0.69
leanor    -0.68
trak      -0.67
ŃĶ        -0.67
riad      -0.67
ton       -0.66
¬¼        -0.66
Positive Logits
Token       Logit
spoilers    1.17
spoiler     1.16
oiler       1.01
Spoiler     0.95
OIL         0.94
spo         0.93
spoil       0.92
ervative    0.83
Collider    0.75
":""},{"    0.75
Activations Density 0.031%