Explanations
positive comments or remarks
instances of a specific character or symbol repeated throughout the text
Negative Logits
vulner      -0.96
disadvant   -0.89
mathemat    -0.82
princ       -0.82
accomp      -0.76
constitu    -0.76
fundament   -0.76
traged      -0.74
advis       -0.73
sacrific    -0.73
Positive Logits
ï¸ı         1.12
ï¸          0.92
à¥          0.82
RW          0.81
âĶĢâĶĢ      0.79
ı           0.79
æľ          0.79
\":         0.78
lime        0.77
ãĥ´ãĤ¡      0.75
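
Several of the positive-logit tokens above look like encoding debris, but they are consistent with the printable byte alphabet used by GPT-2 style byte-level BPE vocabularies. The page does not name its tokenizer, so treat that as an assumption. Under it, the sketch below maps each displayed character back to its raw byte and decodes the result as UTF-8. The function names (`byte_to_char`, `token_to_bytes`, `decode_token`) are illustrative, not part of any library.

```python
def byte_to_char():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer uses this same table.
    """
    # Bytes that are already printable keep their own code point.
    keep = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    table = {b: chr(b) for b in keep}
    # Every other byte is shifted to 256, 257, ... in ascending byte order.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


CHAR_TO_BYTE = {ch: b for b, ch in byte_to_char().items()}


def token_to_bytes(token):
    """Recover the raw bytes behind a displayed vocabulary token."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_token(token):
    """Decode a token as UTF-8; incomplete sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Tokens copied from the positive-logit list above.
    for tok in ["ï¸ı", "ï¸", "à¥", "âĶĢâĶĢ", "ı", "æľ", "ãĥ´ãĤ¡"]:
        raw = token_to_bytes(tok)
        print(ascii(tok), raw.hex(" "), ascii(decode_token(tok)))
```

If the assumption holds, `ï¸ı` is the three bytes `ef b8 8f` (U+FE0F, the emoji variation selector), `âĶĢâĶĢ` is two box-drawing characters (U+2500), and `ãĥ´ãĤ¡` is the katakana pair U+30F4 U+30A1. Tokens such as `ï¸`, `à¥`, `ı`, and `æľ` are fragments of multi-byte characters and do not decode on their own. That reading fits the second explanation, which describes a specific character or symbol repeated throughout the text.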
Activations Density: 0.257%