Explanations
phrases indicating frequency or repetition in actions
Negative Logits
token      logit
atis       -0.19
usi        -0.16
encoded    -0.15
suit       -0.14
aber       -0.14
etic       -0.14
olanlar    -0.14
íĥĪ        -0.14
Wire       -0.13
ivated     -0.13
Positive Logits
token      logit
omics      0.17
üstü       0.16
dik        0.15
aira       0.15
ilty       0.15
DBC        0.14
ideos      0.14
anno       0.14
azu        0.14
jak        0.13
Activations Density: 0.018%