INDEX
Explanations
expressions of personal sentiment or opinions
Negative Logits
ocrates    -0.16
cept       -0.15
aby        -0.14
ments      -0.14
igin       -0.14
urs        -0.14
lse        -0.14
obi        -0.13
へと       -0.13
ogens      -0.13
Positive Logits
reff       0.16
ャ         0.15
utomation  0.15
rane       0.15
EIF        0.14
treff      0.14
erli       0.14
TRGL       0.14
EAR        0.14
ведь       0.13
Activations Density: 0.182%