Explanations
emotional connections and interactions between individuals
Negative Logits
토토  -0.15
antro  -0.14
toxicity  -0.13
-Owned  -0.13
重大  -0.13
reon  -0.13
Owned  -0.12
ivas  -0.12
ãĨ  -0.12
andler  -0.12
Positive Logits
hap  0.32
startled  0.31
unsus  0.30
eager  0.29
unwitting  0.28
puzzled  0.28
grateful  0.28
astonished  0.28
bem  0.28
frustrated  0.27
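Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. Whether this page uses exactly that procedure is an assumption. The sketch below shows the idea in plain Python; `logit_effects`, its parameters, and the toy vectors are hypothetical illustrations, not real model weights or values from this feature.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Score every vocabulary token by the feature's direct effect on its logit.

    Assumption: decoder_vec is the feature's decoder row (length d_model) and
    unembed is a d_model x len(vocab) matrix given as a list of rows.
    Returns (top_positive, top_negative) as lists of (token, score) pairs.
    """
    d_model = len(decoder_vec)
    # Dot product of the decoder direction with each column of the unembedding.
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending: most negative first, most positive last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
pos, neg = logit_effects(
    [1.0, -0.5],
    [[0.3, 0.2, -0.1, 0.0], [0.0, 0.1, 0.1, 0.2]],
    ["a", "b", "c", "d"],
    k=2,
)
print([t for t, _ in pos])  # ['a', 'b']
print([t for t, _ in neg])  # ['c', 'd']
```

Because the scores come from weights alone, lists like these describe what the feature pushes the output toward, not the contexts in which it fires.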
Activation Density: 0.535%
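Activation density is usually read as the fraction of token positions on which the feature fires, so 0.535% works out to roughly 5 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero (as in a ReLU-style sparse autoencoder); the function name `activation_density` is hypothetical and the input is toy data.

```python
def activation_density(activations):
    """Fraction of token positions where the feature is active.

    Assumption: a position counts as active when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# Toy data: the feature fires on 5 of 1,000 positions.
density = activation_density([0.0] * 995 + [1.2] * 5)
print(f"{density:.3%}")  # 0.500%
```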