INDEX
Explanations
phrases related to suggesting playing or engagement in some activity
references to specific people's names or identifiers
Negative Logits
prus      -0.77
braska    -0.76
flix      -0.69
?]        -0.68
cephal    -0.68
aughs     -0.68
\">       -0.66
pes       -0.65
oiler     -0.65
oen       -0.64
Positive Logits
ilon      0.72
Soviets   0.68
olver     0.64
åİ        0.64
cosmic    0.63
outgoing  0.63
lan       0.63
Magikarp  0.62
eral      0.62
Lans      0.61
Activations Density: 0.000%