INDEX
Explanations
abbreviations and symbols used for expressing emphasis or directional relationships
references to a specific demographic group, particularly focusing on individuals
Negative Logits
giveaways   -0.73
swept       -0.64
scatter     -0.63
wob         -0.62
waste       -0.62
sweep       -0.61
dispers     -0.61
OM          -0.60
romy        -0.60
mammoth     -0.60
Positive Logits
¹           0.95
¬           0.90
¡           0.88
ername      0.84
who         0.83
Ī           0.83
ı           0.81
ij          0.80
ª           0.78
Ĵ           0.78
Activations Density 0.204%