Explanations
phrases related to legal actions or criminal activities
Negative Logits
Token     Logit
Birth     -0.64
Ink       -0.62
grave     -0.60
alam      -0.60
repre     -0.60
olia      -0.58
Emer      -0.58
bourg     -0.58
Wond      -0.57
ortium    -0.57
Positive Logits
Token     Logit
swick     1.05
aways     0.99
gs        0.91
dy        0.88
escape    0.88
ners      0.85
ways      0.84
Disney    0.83
af        0.81
nin       0.80
Activations Density 4.026%