Explanations
words and phrases related to personality traits, specifically introversion and extroversion
Negative Logits
ide      -0.15
Pest     -0.14
aldo     -0.14
Platt    -0.14
leigh    -0.14
ringe    -0.14
eÄį      -0.14
otton    -0.14
eton     -0.14
Monroe   -0.14
Positive Logits
unm      0.17
imar     0.16
version  0.15
verts    0.15
ãĥ¼ãĥŃ   0.15
estate   0.14
prung    0.14
ool      0.14
Gel      0.14
super    0.14
Activations Density: 0.027%