INDEX
Explanations
terms related to eating disorders
instances of the word "ore."
Negative Logits
srf       -0.80
ulk       -0.73
cffff     -0.73
otaur     -0.70
¥ŀ        -0.68
ilts      -0.68
arb       -0.68
itars     -0.68
insula    -0.68
ued       -0.66
Positive Logits
tto       1.29
gon       1.10
tsky      1.08
byss      1.05
lli       1.05
ttes      1.04
xia       0.96
nz        0.95
tta       0.95
cki       0.90
Activations Density 0.020%