Explanations
references to identity, roles, and classifications of behavior
Negative Logits
scolas            -0.64
educativo         -0.60
PublicKey         -0.57
bedrijven         -0.56
sság              -0.54
disambiguazione   -0.54
Luglio            -0.54
educativas        -0.54
vábbi             -0.53
ówno              -0.53
Positive Logits
genius            0.68
optim             0.67
extro             0.63
hero              0.62
person            0.61
***!              0.59
weir              0.59
idiot             0.56
gentleman         0.56
nerd              0.56
Activations Density 0.343%