INDEX
Explanations
numerical representations or references that quantify specific information
Negative Logits
himſelf         -0.85
myſelf          -0.79
raiſ            -0.78
themſelves      -0.76
purpoſe         -0.72
ſmall           -0.72
defaultstate    -0.70
tranſ           -0.69
itſelf          -0.69
ſtate           -0.69
Positive Logits
7    1.03
9    1.02
5    1.02
6    1.01
8    1.00
4    0.99
3    0.96
0    0.93
2    0.91
1    0.89
Activations Density 1.446%