INDEX
Explanations
terms related to various categories such as music, politics, crime, and different medical conditions
terms related to crime, media, pop culture, and significant figures or events
Negative Logits
ingham    -0.75
ries      -0.75
hips      -0.72
nings     -0.71
ships     -0.70
icka      -0.70
rice      -0.70
liness    -0.68
older     -0.68
ls        -0.67
Positive Logits
ħĭ        0.74
ãĥŁ       0.70
metic     0.63
æ©        0.59
ãĥĻ       0.58
ãģı       0.56
ãĤ«       0.53
Juda      0.52
ĵĺ        0.52
éĹĺ       0.52
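Several of the positive-logit tokens above look like mojibake, but they are consistent with the printable alphabet that byte-level BPE tokenizers use to display raw bytes. The sketch below assumes the GPT-2 style byte-to-character mapping; the source does not say which tokenizer produced these tokens, so treat this as an illustration only. The helper names (`gpt2_byte_decoder`, `token_to_bytes`, `decode_token`) are hypothetical and not part of any dashboard API.

```python
def gpt2_byte_decoder():
    """Build the display-character -> byte table (assumed GPT-2 style mapping)."""
    # Bytes that are shown as themselves: '!'..'~', '¡'..'¬', '®'..'ÿ'.
    keep = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    table = {chr(b): b for b in keep}
    # Every other byte is shown as a character starting at code point 256.
    n = 0
    for b in range(256):
        if b not in keep:
            table[chr(256 + n)] = b
            n += 1
    return table


def token_to_bytes(token):
    """Map a displayed token string back to the raw bytes it stands for."""
    table = gpt2_byte_decoder()
    return bytes(table[ch] for ch in token)


def decode_token(token):
    """Decode the raw bytes as UTF-8; incomplete sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Tokens copied from the positive-logits list above.
    for tok in ["ãĥŁ", "ãģı", "ãĤ«", "metic", "ħĭ"]:
        print(repr(tok), token_to_bytes(tok).hex(), repr(decode_token(tok)))
```

Under this assumed mapping, `ãĥŁ` corresponds to the bytes `e3 83 9f`, which is the UTF-8 encoding of the katakana ミ. Tokens such as `ħĭ` hold only continuation bytes, so they do not decode to a full character on their own and show up as replacement characters.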
Activations Density 0.511%