INDEX
Explanations
mentions of the word "orn"
mentions of unicorns
Negative Logits
dule        -0.76
Pwr         -0.73
perature    -0.71
refreshing  -0.67
Lowry       -0.61
MIA         -0.61
pton        -0.60
Quincy      -0.60
honestly    -0.60
arnaev      -0.60
Positive Logits
orn         1.02
obyl        0.96
odon        0.95
alia        0.87
quist       0.78
ication     0.77
ail         0.76
usa         0.76
OGR         0.75
ography     0.75
Activations Density: 0.008%