Explanations
references to locations and specific details in news articles
Negative Logits
Rav          -0.59
cru          -0.57
ham          -0.57
shire        -0.56
itiz         -0.56
compan       -0.56
orship       -0.56
spr          -0.56
stru         -0.55
shaped       -0.55
Positive Logits
+.           1.02
+)           0.92
+,           0.84
ABV          0.84
-+           0.83
/-           0.76
+            0.76
ablishment   0.74
↑            0.73
/+           0.72
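Dashboards of this kind typically build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then keep the highest- and lowest-scoring tokens. The sketch below assumes exactly that procedure. The function name `top_logit_effects`, its parameters, and the toy inputs are hypothetical. It does not reproduce whatever scaling produced the values listed above.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical sketch: score each token as decoder_dir . unembed[:, token].

    `unembed` is laid out as d_model rows by len(vocab) columns.
    Returns (most_negative, most_positive), each a list of k (token, score) pairs.
    """
    scores: List[Pair] = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with this token's unembedding column.
        score = sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)  # [('b', -1.0)] [('c', 2.0)]
```

Tokens at the top of the positive list are the ones the feature most directly pushes the model toward predicting. Tokens at the bottom of the negative list are the ones it suppresses.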
Activation Density: 0.286%
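Activation density is usually read as the share of tokens in a sample on which the feature fires. At 0.286%, that is roughly 286 tokens per 100,000. The calculation can be sketched as follows. This assumes "fires" means an activation strictly above zero. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is above `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than zero, the usual convention for ReLU-style SAE features.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 3 active tokens out of 1,000 gives a density of 0.3%.
sample = [0.0] * 997 + [0.5, 1.2, 3.0]
print(activation_density(sample))  # 0.3
```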