INDEX
Explanations
references to unexpected or surprising events
Negative Logits
NavController    -0.71
Kat              -0.69
Kat              -0.65
fifths           -0.62
jerc             -0.62
orianCalendar    -0.61
IMDG             -0.60
fær              -0.60
I                -0.59
خاذ              -0.59
Positive Logits
Surprise         1.23
surprise         1.11
surpris          1.05
Surprise         0.97
surprised        0.96
reaſon           0.93
surprise         0.92
surprises        0.89
pleaſure         0.86
itſelf           0.84
Activations Density 0.006%