INDEX
Explanations
dates written in a specific format
references to events and entities, particularly names and dates
Negative Logits
Token      Logit
Ples       -0.84
Streets    -0.84
Turtles    -0.84
Tos        -0.82
Spo        -0.79
Spot       -0.77
pim        -0.77
Pom        -0.75
Ts         -0.74
Kis        -0.73
Positive Logits
Token      Logit
221        1.13
221        1.02
arn        1.00
220        0.99
211        0.98
21         0.98
223        0.97
jab        0.95
220        0.93
HER        0.92
Activations Density: 0.550%