INDEX
Explanations
terms related to confirmation of existing knowledge or facts
Negative Logits
_AA          -0.16
cele         -0.15
ابر          -0.15
ereum        -0.14
Cele         -0.14
anguage      -0.14
celebrity    -0.14
alien        -0.14
alone        -0.14
anggal       -0.13
Positive Logits
unknown      0.33
unknown      0.26
Unknown      0.26
_unknown     0.24
Unknown      0.23
UNKNOWN      0.23
UNKNOWN      0.20
initial      0.18
unks         0.18
undefined    0.18
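
For readers unfamiliar with logit tables of this kind, the sketch below shows one common way such rankings can be produced: take a feature's output direction, dot it with each token's unembedding vector, and list the most negative and most positive results. This is an assumption about how dashboards like this typically work, not a description of how the values on this page were computed. The function names (`logit_effects`, `top_and_bottom`), the two-dimensional vectors, and every number in it are hypothetical toy values.

```python
# Minimal sketch, assuming logit effects are dot products between a
# feature direction and per-token unembedding vectors (an assumption).
# All names and numbers here are hypothetical toy values.

def logit_effects(direction, unembed):
    """Return {token: dot(direction, token_vector)} for every token."""
    return {
        token: sum(d * w for d, w in zip(direction, vector))
        for token, vector in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy feature direction and toy unembedding vectors (not real model weights).
direction = [1.0, -0.5]
unembed = {
    "unknown": [0.3, -0.1],
    "undefined": [0.2, 0.0],
    "cele": [-0.1, 0.1],
    "alien": [-0.1, 0.05],
}

effects = logit_effects(direction, unembed)
negative, positive = top_and_bottom(effects, 2)

for token, value in negative + positive:
    print(f"{token:<12}{value:+.2f}")
```

With real model weights the same ranking step would run over the whole vocabulary, which is why near-duplicate tokens that differ only in casing or a leading space can each appear as separate rows.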
Activations Density 0.005%