INDEX
Explanations
words before or after specific tokens
Negative Logits
invest    0.48
Продол    0.44
Parte    0.43
Arte    0.43
arine    0.42
awal    0.41
наи    0.41
াইয়    0.41
آرام    0.41
சிறிய    0.41
Positive Logits
elementType    0.45
’।    0.45
Donnelly    0.44
্নে    0.44
grandson    0.42
تقديم    0.41
ⓡ    0.40
ਗੇ    0.40
네    0.39
숲    0.39
Activations Density: 0.005%