INDEX
Explanations
references to religious texts and their interpretations
Negative Logits
unde        -0.17
lip         -0.15
sto         -0.15
ifar        -0.15
semicolon   -0.14
psz         -0.14
afen        -0.14
CID         -0.14
iddles      -0.14
se          -0.14
Positive Logits
Nous        0.15
izard       0.15
Gardens     0.15
We          0.15
Warner      0.14
Our         0.14
ohl         0.14
ACE         0.14
Associates  0.14
Toro        0.14
Activations Density 0.026%