INDEX
Explanations
phrases indicating collective experiences and emotions
Negative Logits
ạo            -0.16
rian          -0.16
anywhere      -0.15
never         -0.15
afone         -0.14
άÏģÏĩ         -0.14
aise          -0.14
createClass   -0.14
Restricted    -0.14
urge          -0.14
Positive Logits
except        0.25
Except        0.24
Except        0.23
except        0.22
alike         0.20
Everyone      0.20
_except       0.19
ayed          0.19
Everyone      0.18
Everybody     0.18
Activations Density: 0.138%