Explanation: expressions of agreement or opinion
Negative logits (token, logit)
abstrato        -0.41
laikā           -0.40
yalnızca        -0.39
zís             -0.37
entraîner       -0.36
さまざまな      -0.35
çeşitli         -0.35
ferons          -0.34
almofada        -0.34
alcançar        -0.34

Positive logits (token, logit)
referrerpolicy   1.01
httphttps        0.96
BTW              0.91
FTFY             0.90
btw              0.88
שוליים           0.87
IMHO             0.85
Thx              0.82
Witam            0.82
prolly           0.81
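
The logit lists rank tokens by how strongly this feature raises or lowers their output logits. One common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and read off the lowest and highest entries. This is a minimal sketch in plain Python; `top_logit_effects`, `decoder_row`, and `unembed` are hypothetical names, and the projection ignores any final layer norm.

```python
def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature's decoder direction.

    Assumption: effect[j] = sum_i decoder_row[i] * unembed[i][j], i.e. a plain
    projection onto the unembedding with no layer norm applied.
    decoder_row: list of d_model floats
    unembed: d_model rows, each a list of len(vocab) floats
    """
    # Direct logit effect for every token in the vocabulary.
    effects = [
        sum(d * unembed[i][j] for i, d in enumerate(decoder_row))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    # Largest effects first.
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive
```

For a toy model with `decoder_row = [1.0, 0.0]`, `unembed = [[0.5, -0.2, 0.9], [0.0, 1.0, 0.0]]`, and `vocab = ["a", "b", "c"]`, calling it with `k=1` returns `([("b", -0.2)], [("c", 0.9)])`.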

Activation density: 6.507%
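
Activation density is usually read as the share of sampled token positions on which the feature's activation is above zero. Assuming that definition, 6.507% means the feature fires on roughly 65 of every 1,000 tokens. A minimal sketch with a hypothetical `activation_density` helper:

```python
def activation_density(activations):
    """Percentage of positions where the feature's activation is strictly positive.

    Assumption: "active" means activation > 0, as is typical for ReLU-style features.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)
```

For example, `activation_density([0.0, 1.2, 0.0, 0.0])` returns `25.0`.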