Explanations
references to programming frameworks and libraries
Negative Logits
"          -0.60
(blank)    -0.55
--         -0.50
in         -0.50
start      -0.48
#          -0.46
"          -0.46
a          -0.44
''         -0.44
#          -0.44
Positive Logits
houſe            1.10
pleaſure         0.92
Houſe            0.91
Diſ              0.90
RetentionPolicy  0.89
ſch              0.88
Jefus            0.88
ſta              0.85
purpoſe          0.83
itſelf           0.83
Activations Density 0.011%