INDEX
Explanations
mentions of reading, books, and literature
Negative Logits
Token           Logit
Comprometido    -0.51
garmin          -0.40
cuota           -0.38
relaj           -0.37
cumplido        -0.37
انجليز          -0.35
tatuaje         -0.35
experimentado   -0.34
katholischen    -0.34
capitán         -0.33
Positive Logits
Token   Logit
《       1.10
《       0.95
The     0.81
『       0.81
『       0.80
:《      0.73
'       0.72
‘       0.68
"       0.64
The     0.63
Activations Density: 0.975%
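
The page does not define the density figure. Dashboards of this kind usually report the share of sampled tokens on which the feature has a nonzero activation, and the minimal sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2,000 tokens, 20 of which activate the feature.
sample = [0.0] * 1980 + [1.5] * 20
density = activation_density(sample)
print(f"{density:.3%}")  # prints "1.000%"
```

Under this assumption, a density of 0.975% would mean the feature fires on slightly fewer than 1 in 100 sampled tokens.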