INDEX
Explanations
sentences discussing viewpoints or statements attributed to various groups or individuals
statements or claims attributed to various parties or individuals
Negative Logits
Token         Logit
ĸļ士          -0.80
cffffcc       -0.79
Written       -0.74
theless       -0.70
ptives        -0.68
written       -0.68
tele          -0.67
dinand        -0.65
ãĤ            -0.65
productive    -0.65
Positive Logits
Token         Logit
goodbye       1.07
olate         0.70
hello         0.67
they          0.67
IDA           0.66
it            0.66
olated        0.66
NAD           0.66
ansky         0.65
MSG           0.64
Activations Density 0.122%