INDEX
Explanations
specific numerical values and their relationships within a structured context
Non-English or code-related text
false response
Negative Logits
gunt             -0.82
Arund            -0.79
baj              -0.78
arent            -0.78
ofollow          -0.77
Khat             -0.76
ammen            -0.75
autorytatywna    -0.74
Mahat            -0.71
UCT              -0.71
Positive Logits
ศึกษา            0.69
Brod             0.66
Wilk             0.66
Hamm             0.65
ณา               0.65
impostor         0.64
Robb             0.64
Holm             0.63
Gorb             0.63
Haye             0.62
Activations Density 1.806%