INDEX
Explanations
references to significant anniversaries or milestones
Negative Logits
辺        -0.15
Neutral   -0.15
ahn       -0.14
uckets    -0.14
evil      -0.14
izi       -0.13
Orta      -0.13
neutral   -0.13
Gre       -0.13
766       -0.13
Positive Logits
ırak       0.18
oplevel    0.17
ateur      0.15
opsy       0.15
baise      0.14
Sham       0.14
-instance  0.14
intimid    0.14
Guy        0.14
Sys        0.13
Activations Density: 0.060%