Explanations
abbreviations and specific terms
Negative Logits
á         0.95
are       0.77
ker       0.73
be        0.70
ow        0.70
werk      0.70
for       0.69
ich       0.68
ana       0.68
werte     0.66
Positive Logits
ل         0.94
К         0.94
motores   0.88
在        0.86
ограни    0.80
ન         0.80
empêcher  0.79
ル        0.78
cultivés  0.75
ش         0.75
Activations Density 0.839%