INDEX
Explanations
names of people or places involved in news or political events
the presence of commas in a structured context
Negative Logits
concess       -0.65
ipeg          -0.65
experience    -0.64
reth          -0.62
capacity      -0.61
overview      -0.61
prompt        -0.60
qual          -0.59
rid           -0.58
cog           -0.58
Positive Logits
aka            1.07
Jr             0.91
Sr             0.90
namely         0.85
ModLoader      0.84
Jr             0.84
Magikarp       0.84
although       0.83
albeit         0.80
whose          0.79
Activations Density 0.445%