INDEX
Explanations
references to physical spaces, particularly indoors and outdoors
Negative Logits
Token      Logit
thing      -0.17
eros       -0.17
ries       -0.17
shit       -0.17
mgr        -0.17
erus       -0.17
ataires    -0.16
ws         -0.15
eness      -0.15
maker      -0.15
Positive Logits
Token      Logit
/out       0.40
-out       0.30
-Out       0.24
halb       0.24
ÙĪØ®     0.23
OUT        0.22
Out        0.20
/up        0.20
out        0.19
joke       0.19
Activations Density 0.034%