Explanations
references to camaraderie and community spirit
Negative Logits
edla      -0.16
si        -0.15
ing       -0.14
edio      -0.14
ccione    -0.14
esi       -0.14
tl        -0.14
heimer    -0.14
sid       -0.14
ifa       -0.14
Positive Logits
ader      0.35
ADER      0.26
ade       0.21
adera     0.20
adar      0.20
spirit    0.20
ades      0.19
erie      0.19
aders     0.18
oufl      0.17
Activations Density 0.006%