Explanations
quotes and statements expressing opinions or observations
Negative Logits
“        -0.24
„        -0.21
“[       -0.19
「あ     -0.18
``       -0.16
「お     -0.15
(“       -0.15
“…       -0.14
“We      -0.14
tte      -0.14
Positive Logits
gnore    0.27
bsite    0.26
apons    0.26
gether   0.22
ir       0.22
adays    0.20
crease   0.18
tempts   0.18
","","   0.17
bove     0.16
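One common way dashboards like this produce logit lists is to project the feature's decoder (output) direction through the model's unembedding matrix and rank every vocabulary token by the result. The sketch below shows that idea in plain Python on toy numbers. It is an assumption about how such lists are typically computed, not a statement of how these particular values were produced (the dashboard may apply extra normalization). The names `decoder_dir`, `unembed`, `logit_effects`, and `top_logits` are hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by how much a feature's
# decoder direction raises or lowers their logits (decoder_dir @ W_U).

def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, score) pairs for every vocabulary entry.

    decoder_dir: list of d_model floats (the feature's assumed output direction)
    unembed: d_model rows, each a list of len(vocab) floats (assumed W_U)
    vocab: list of token strings
    """
    scores = [0.0] * len(vocab)
    for d, weight in enumerate(decoder_dir):
        row = unembed[d]
        for t in range(len(vocab)):
            scores[t] += weight * row[t]
    return list(zip(vocab, scores))


def top_logits(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(logit_effects(decoder_dir, unembed, vocab), key=lambda p: p[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of three tokens.
    vocab = ["a", "b", "c"]
    unembed = [[0.2, -0.1, 0.0],
               [0.4, 0.3, -0.2]]
    neg, pos = top_logits([1.0, -0.5], unembed, vocab, k=1)
    print(neg, pos)
```

With these toy weights, token "b" is pushed down the most and token "c" is pushed up the most.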
Activation Density: 0.103%
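Activation density is usually read as the share of tokens on which the feature fires. A density of 0.103% therefore corresponds to roughly one firing per thousand tokens. The minimal sketch below assumes "fires" means an activation strictly above zero. The dataset and any threshold behind the 0.103% figure are not stated here, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density as the fraction of tokens whose
# feature activation exceeds a threshold (assumed to be 0.0 by default).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 103 firing tokens out of 100,000 gives a density of about 0.103%.
    acts = [0.0] * 99_897 + [0.5] * 103
    print(f"{activation_density(acts) * 100:.3f}%")
```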