INDEX
Explanations
expressions of emotional states and personal qualities
Negative Logits
houſe      -0.96
Efq        -0.92
Theſe      -0.88
Monfieur   -0.87
fevere     -0.86
ſelf       -0.85
ſta        -0.83
ſtate      -0.81
laſt       -0.81
ſche       -0.80
Positive Logits
without    0.69
without    0.56
WITHOUT    0.56
vi         0.53
образом    0.52
Without    0.50
Without    0.48
tanpa      0.48
ohne       0.48
,          0.47
Activations Density: 1.091%