Explanations
No Explanations Found
Negative Logits
Token       Logit
ocene       -0.86
heit        -0.83
ropolis     -0.81
chnology    -0.80
aceous      -0.79
elin        -0.74
phasis      -0.74
romy        -0.74
iens        -0.73
alde        -0.73
Positive Logits
Token       Logit
é¾          0.73
homophobic  0.70
%%%%        0.67
Pipe        0.63
darts       0.62
boo         0.59
homophobia  0.59
uterte      0.58
tir         0.58
detract     0.58
Activations Density: 0.000%
No Known Activations
This feature has no known activations.