Explanations
numerical values associated with specific categories or ratings
Negative Logits
Token       Logit
dür         -0.16
ÑģÑĮ        -0.16
ource       -0.16
vell        -0.15
inal        -0.15
onds        -0.15
idebar      -0.15
ano         -0.15
oretical    -0.15
drawing     -0.14
Positive Logits
Token       Logit
年代        0.26
s           0.25
something   0.21
odd         0.17
ish         0.17
-Ñħ         0.17
th          0.16
Something   0.16
ahlen       0.16
625         0.16
Activations Density: 0.260%