INDEX
Explanations
high-frequency words and phrases indicating actions or states
Negative Logits
endor    -0.17
OTE      -0.16
ote      -0.16
itele    -0.16
rada     -0.16
889      -0.15
ae       -0.15
ke       -0.14
ampo     -0.14
anja     -0.14
Positive Logits
Maver       0.15
Çİ          0.15
αÏģά        0.14
redicate    0.14
ltra        0.14
çŃ          0.14
apy         0.14
áu          0.14
oine        0.14
reno        0.14
Activations Density: 0.022%