INDEX
Explanations
well succeeded, well used, well learned
Negative Logits
not        -0.90
even       -0.77
pthread    -0.75
Basil      -0.75
ish        -0.73
ʲ          -0.72
otypic     -0.72
historia   -0.72
Sergio     -0.72
freund     -0.71
Positive Logits
veillance  1.28
heure      0.98
墓         0.95
خاصی       0.91
буенча     0.91
Beschluss  0.90
peggio     0.88
LXX        0.87
puțin      0.85
ఫ          0.84
Activations Density: 0.007%