INDEX
Explanations
phrases or sentences related to opinions or viewpoints
Negative Logits
Brist         -0.81
anwhile       -0.79
Osw           -0.79
RAD           -0.76
Cosmetic      -0.75
airs          -0.74
horizont      -0.74
ifications    -0.73
Palest        -0.72
Downloadha    -0.72
Positive Logits
¬             1.47
Ļ             1.22
¥             1.14
¡             1.14
ı             1.11
£             1.10
Ķ             1.08
Ń             1.04
Ĵ             1.03
ª             1.03
Activations Density: 0.323%