Explanations
phrases that indicate perception or classification of concepts
Negative Logits
Jefus          -0.81
itſelf         -0.81
uſed           -0.78
knecht         -0.76
出版年         -0.75  ("publication year")
SPATH          -0.75
saveiro        -0.74
purpoſe        -0.73
Majefty        -0.71
Efq            -0.71
Positive Logits
“               0.84
a               0.79
‘               0.71
writerow        0.68
"               0.68
being           0.66
是一種          0.63  ("is a kind of")
be              0.62
«               0.61
part            0.61
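Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative tokens. The sketch below assumes that method. The function name `logit_effects`, the row/column layout of `unembed`, the toy numbers, and the omission of any final layer-norm scaling are illustrative assumptions, not this dashboard's confirmed implementation.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature's decoder direction
    pushes their logits up or down (hypothetical helper)."""
    # Score for token j = dot(decoder_dir, column j of the unembedding).
    # unembed is assumed to be laid out as d_model rows by len(vocab) columns.
    scores = [
        (token, sum(d * row[j] for d, row in zip(decoder_dir, unembed)))
        for j, token in enumerate(vocab)
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]    # most promoted tokens, most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 3 tokens, keep the top 1 each way.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.9],
           [0.1, 0.6, -0.2]]
positive, negative = logit_effects(decoder_dir, unembed, ["a", "b", "c"], k=1)
print(positive)
print(negative)
```

Here token "c" scores about 1.0 and "b" about -0.7, so they head the positive and negative lists respectively, mirroring how `“` and `Jefus` head the lists above.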
Activation density: 0.373%
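Activation density is usually read as the share of sampled token positions on which the feature fires; at 0.373%, that is roughly 373 active positions per 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation is
    strictly above `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 3 active positions out of 1,000 gives a density of 0.3%.
sample = [0.0] * 997 + [0.5, 1.2, 0.8]
print(f"{activation_density(sample):.3f}%")
```

Using a strict `>` comparison means positions with exactly zero activation, the usual output of a ReLU-style encoder when the feature is off, are not counted as active.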