Explanations
statements of ongoing actions or experiences
Negative Logits
purpoſe       -0.95
Chriftian     -0.91
itſelf        -0.90
ſelf          -0.88
AndEndTag     -0.88
themſelves    -0.88
Reſ           -0.88
Jefus         -0.87
Efq           -0.87
myſelf        -0.86
Positive Logits
as               0.49
kér              0.49
rotum            0.49
ve               0.48
ja               0.47
te               0.46
GenerationType   0.46
EconPapers       0.45
’                0.44
a                0.44
Activations Density: 0.385%