Explanations
expressions of emotional responses and contemplations
Negative Logits
felves         -0.76
Personensuche  -0.73
utafitiHapana  -0.69
himſelf        -0.69
нодоро         -0.68
ectoria        -0.67
iconque        -0.66
tersebut       -0.66
μως            -0.66
újo            -0.66
Positive Logits
ąg      0.54
You     0.53
oh      0.53
You     0.52
ostavi  0.51
a       0.51
du      0.48
رب      0.48
х       0.47
Go      0.47
Activations Density: 0.300%