INDEX
Explanations
repetitive phrases or expressions emphasizing a level of intensity or significance
Negative Logits
so        -0.31
så        -0.21
так       -0.19
如此      -0.18
udeau     -0.17
so        -0.17
roman     -0.16
pane      -0.16
ature     -0.16
ільш      -0.16
Positive Logits
-called   0.46
apy       0.31
far       0.30
iled      0.28
oner      0.28
jour      0.28
forth     0.27
oth       0.27
ething    0.27
aks       0.26
Activations Density 0.154%