Explanations
personal traits or characteristics of individuals, especially related to appearance or behavior
elements related to complex character traits and emotional experiences
Negative Logits
Token       Logit
Sav         -0.64
yss         -0.60
idth        -0.60
stewards    -0.58
imar        -0.57
ariat       -0.54
Panama      -0.54
Americ      -0.53
olon        -0.53
odore       -0.53
Positive Logits
Token       Logit
*.          1.16
!.          1.00
.*          0.95
.(          0.91
;)          0.91
.[          0.89
ðŁĻĤ        0.89
+.          0.88
haha        0.88
thanks      0.88
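The token ðŁĻĤ in the positive list is not readable as written. It looks like the printable form that byte-level BPE vocabularies use for raw bytes. The sketch below assumes, without confirmation from this page, that the tokenizer follows the GPT-2 byte-to-character mapping; the helper names are hypothetical. Under that assumption the token decodes to the slightly smiling face emoji (U+1F642), which fits the informal, friendly tone of the other positive tokens.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values to printable characters.

    Assumption: the dashboard's tokenizer uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token):
    """Map a displayed vocabulary token back to the UTF-8 text it encodes."""
    unicode_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(unicode_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ðŁĻĤ"))  # 🙂 (U+1F642)
    print(decode_token("haha"))  # plain ASCII tokens are unchanged
```

Plain ASCII tokens such as haha or ;) pass through unchanged, so only the byte-remapped entries need this step.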
Activations Density 0.879%
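Activation density is usually read as the share of sampled tokens on which the feature fires at all. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above zero; the function name, threshold, and sample values are illustrative and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds the threshold.

    Assumption: a token counts as active when its activation is > threshold.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations for eight tokens; two of them are nonzero.
    sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
    print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 0.879% would mean the feature is active on roughly 9 of every 1,000 sampled tokens.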