INDEX
Explanations
words associated with actions or events happening in the past
punctuation and specific characters in text
Negative Logits
Token       Logit
agar        -1.03
sag         -0.87
agin        -0.86
Sag         -0.85
actor       -0.76
Johnson     -0.75
657         -0.74
API         -0.74
bag         -0.73
federation  -0.73
Positive Logits
Token       Logit
Mon          1.79
Mon          1.73
MON          1.62
mon          1.43
MON          1.37
Monte        1.33
Tue          1.24
mon          1.18
Monarch      1.16
Tue          1.12
Activation Density: 0.344%