INDEX
Explanations
keywords or specific phrases related to notes or annotations
references to "notes" in various contexts
Negative Logits
LOS         -0.80
nom         -0.68
slaught     -0.65
wyn         -0.65
amed        -0.65
SW          -0.60
homicide    -0.60
multif      -0.60
eve         -0.59
naissance   -0.59
POSITIVE LOGITS
notes       1.03
notes       1.01
Notes       0.99
books       0.90
Notes       0.77
note        0.75
afety       0.74
Gained      0.72
heet        0.72
.>>         0.71
Activations Density 0.010%