INDEX
Explanations
expressions of personal opinions and experiences
Negative Logits
Token        Logit
nothing      -1.00
Nothing      -0.88
Nothing      -0.84
nothing      -0.83
only         -0.81
aucune       -0.80
aucun        -0.79
nessuna      -0.78
ingenting    -0.78
rien         -0.77
Positive Logits
Token        Logit
Wasn         0.72
aren         0.70
Infórmanos   0.70
enumii       0.65
isn          0.65
Wasn         0.65
isn          0.65
المعيارى     0.63
Aren         0.61
__*/         0.60
Activations Density: 0.309%