Explanations
words related to speech and communication
Negative Logits
richesse          -0.64
skydd             -0.59
ответственности   -0.58
ing               -0.57
culoare           -0.56
argint            -0.56
registrazione     -0.56
t                 -0.56
Heere             -0.55
färg              -0.55
Positive Logits
Spoken            1.14
Spoken            1.13
spoken            1.04
spoken            0.97
للاسماء           0.88
`<                0.85
Monta             0.81
)</               0.80
poken             0.78
cipital           0.77
Activations Density: 0.001%