INDEX
Explanations
prefixes or words with the letters "im" followed by a single consonant
expressions of personal feelings and states of being
Negative Logits
Token      Logit
ridge      -0.73
rower      -0.71
Christy    -0.70
yards      -0.67
Coy        -0.66
Slate      -0.66
Chao       -0.66
anche      -0.64
Chavez     -0.64
YC         -0.63
Positive Logits
Token      Logit
im         3.73
Im         1.81
IM         1.45
Im         1.38
hes        1.35
imm        1.15
imbalance  1.14
imitation  1.10
Imam       1.08
i          1.05
Activations Density: 0.013%