INDEX
Explanations
proper nouns or names of specific places and entities
negative or contradictory statements
Negative Logits
ÂŃ        -0.70
"—        -0.53
],"       -0.49
umbn      -0.48
—"        -0.45
)—        -0.45
Olympic   -0.45
igenous   -0.44
farious   -0.42
blown     -0.42
Positive Logits
tho       0.64
didnt     0.59
dont      0.58
pics      0.58
alot      0.57
doesnt    0.53
          0.52
ital      0.51
reprint   0.48
ittens    0.48
Activations Density 0.849%
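
For readers unfamiliar with the density figure: it is usually read as the share of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample values are assumptions made for illustration, not this dashboard's actual computation or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is
    strictly above the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 token positions (not taken from this feature).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.2, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 25.000%
```

Under this reading, a density of 0.849% would mean the feature is active on fewer than 1 in 100 sampled tokens.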