INDEX
Explanations
sentences with positive affirmations or praises
expressions of pride and recognition
Negative Logits
Rothschild   -0.65
miah         -0.63
bats         -0.62
contrace     -0.62
ombie        -0.60
Stras        -0.59
ibrary       -0.58
monarchy     -0.58
Telesc       -0.58
mathemat     -0.58
Positive Logits
Deliver         1.51
ices            0.73
ounters         0.72
Corpus          0.69
ーテ            0.67
Advertisement   0.66
ーティ          0.64
omet            0.64
to              0.64
ibaba           0.62
Activations Density 0.000%