Explanations
numbers preceded by apostrophes
negations or words indicating what is not true or should not happen
Negative Logits
laun          -0.75
mortg         -0.70
Kirin         -0.68
Cinderella    -0.67
helicop       -0.67
princ         -0.67
nomine        -0.66
Duo           -0.58
Powered       -0.58
convol        -0.56
Positive Logits
't            1.58
aturally      1.17
itely         1.16
aught         1.09
ately         1.07
ought         0.92
fortunately   0.91
omore         0.90
ighter        0.87
ÃŃ            0.86
Activations Density 0.016%
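
As a rough illustration of the density figure, the sketch below assumes that "Activations Density" means the fraction of sampled token positions on which the feature fires with a nonzero activation. The page does not state this definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density counts positions with activation > threshold;
    the page itself does not define the statistic.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 16 of them active.
sample = [0.0] * 99_984 + [1.3] * 16
density = activation_density(sample)
print(f"{density:.3%}")  # 0.016%
```

Under that assumption, a density of 0.016% corresponds to the feature firing on about 16 of every 100,000 tokens.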