INDEX
Explanations
various forms of measurement and evaluation terminology
Negative Logits
fik        -0.15
Ukr        -0.15
roat       -0.14
ilyn       -0.14
neck       -0.14
sonian     -0.13
Legends    -0.13
legends    -0.13
moon       -0.13
rote       -0.13
Positive Logits
Fou            0.25
Nietzsche      0.22
De             0.21
Gilles         0.20
Cinema         0.20
Madness        0.19
determ         0.19
plateau        0.19
schizophren    0.19
Diagram        0.19
Activations Density: 0.010%