INDEX
Explanations
expressions that emphasize key observations or opinions about various topics
Negative Logits
strup    -0.20
або      -0.15
nish     -0.15
ä¹       -0.15
ook      -0.15
ίÏĦ      -0.14
905      -0.14
uch      -0.13
PFN      -0.13
âĢı      -0.13
Positive Logits
thing          0.37
Thing          0.30
Thing          0.30
thing          0.27
THING          0.21
ãģĵãģ¨ãģ«    0.20
part           0.19
(thing         0.18
fact           0.17
totiž          0.16
Activations Density: 0.108%