Explanations
phrases indicating personal knowledge or familiarity with specific items or concepts
Negative Logits
ikit     -0.15
伴       -0.15
iki      -0.15
ルク     -0.15
rozen    -0.14
ipeg     -0.14
alama    -0.14
irq      -0.14
одерж    -0.14
جدا      -0.13
Positive Logits
refer      0.42
referring  0.42
refers     0.37
mean       0.35
meant      0.34
refer      0.32
Refer      0.31
referred   0.30
Mean       0.30
REFER      0.29
Activations Density 0.192%