Explanations
proper nouns
instances of the acronym "BO"
New Auto-Interp
Negative Logits
mary      -0.83
gew       -0.81
ional     -0.78
imental   -0.75
geist     -0.72
uality    -0.72
orial     -0.72
iment     -0.71
ivities   -0.71
enced     -0.71
Positive Logits
BO        1.17
ARD       0.99
OTH       0.97
OTS       0.97
OSE       0.93
bably     0.92
OM        0.89
OT        0.89
ARDS      0.89
INC       0.89
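The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A minimal sketch of how such a ranking could be produced follows. It assumes, as an illustration only, that each token's score is the dot product of the feature's decoder vector with that token's column of the unembedding matrix, with layer norm ignored. The function names `logit_effects` and `top_and_bottom` and the toy inputs are hypothetical, not the dashboard's actual code.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect = decoder vector . unembedding column (layer norm ignored).

def logit_effects(decoder_vec, unembed, vocab):
    """Return one (token, effect) pair per vocabulary entry.

    decoder_vec: list of d_model floats for a single feature.
    unembed: list of d_model rows, each a list of len(vocab) floats.
    vocab: list of token strings.
    """
    effects = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["BO", "ARD", "mary"]
decoder_vec = [1.0, 0.5]
unembed = [[1.0, 0.5, -0.5],
           [0.4, 0.2, -0.6]]
effects = logit_effects(decoder_vec, unembed, vocab)
positive, negative = top_and_bottom(effects, 1)
print(positive, negative)
```

In this toy setup "BO" scores about 1.2 and "mary" about -0.8, so they land at the top of the positive and negative lists respectively, mirroring the shape of the tables above.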
Activations Density 0.005%
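A density of 0.005% corresponds to a fraction of 5 × 10^-5, i.e. the feature fires on roughly 1 token in 20,000. The sketch below shows that arithmetic under the assumption that "active" means an activation strictly above zero; the function name `activation_density` and the threshold parameter are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature's activation exceeds a threshold (assumed to be 0.0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation is above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One firing token out of 20,000 gives a density of about 0.005%.
acts = [0.0] * 19_999 + [3.2]
print(activation_density(acts))
```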