Explanations
references to personality types, particularly introversion and extroversion
Negative Logits
Token     Logit
ldr       -0.16
Lage      -0.15
rum       -0.15
adge      -0.15
adoo      -0.14
qa        -0.14
quette    -0.14
kuk       -0.14
ucken     -0.14
acas      -0.14
Positive Logits
Token          Logit
g              0.15
Mattis         0.15
ennon          0.15
anya           0.14
makeup         0.14
.imag          0.14
069            0.14
ॉल             0.14
personality    0.14
Wed            0.14
Activations Density 0.016%
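
The density figure is easiest to read as a rate: 0.016% is a fraction of 0.00016, or roughly one token in 6,250. The sketch below shows one way such a figure could be computed, assuming density means the share of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the `threshold` parameter are illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The dashboard does not state its definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, the feature fires on 2 of them.
sample = [0.0] * 9998 + [1.2, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse fires on very few tokens, which fits an explanation as narrow as references to introversion and extroversion.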