INDEX
Explanations
mentions of specific locations, particularly city names
certain geographical or cultural names, particularly those beginning with the prefixes "Gw" and "Dj"
Negative Logits
Token                               Logit
rawdownloadcloneembedreportprint    -0.94
姫                                  -0.76
Candy                               -0.71
contraceptives                      -0.66
contraception                       -0.65
Bunny                               -0.65
ーテ                                -0.65
tenance                             -0.65
++++++++++++++++                    -0.65
sight                               -0.63
Positive Logits
Token    Logit
argo     1.00
uran     0.95
athed    0.94
izoph    0.94
ork      0.92
adr      0.90
ör       0.88
arm      0.87
orn      0.87
ouls     0.87
Activations Density 0.021%