Explanations
words related to ongoing actions or conditions that continue over time
Negative Logits
Token       Logit
umenthal    -0.72
GEAR        -0.71
RN          -0.67
ĪĴ          -0.66
ilde        -0.62
ramid       -0.62
ritical     -0.60
aptic       -0.59
robat       -0.58
vette       -0.58
Positive Logits
Token           Logit
ently           1.57
unchanged       1.10
ency            1.09
indefinitely    1.01
ively           0.98
entially        0.94
uously          0.91
encies          0.90
ences           0.88
ously           0.87
Activations Density: 0.016%
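
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "density" means the fraction of sampled token positions where the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density` and the sample data are hypothetical, chosen only so the arithmetic matches 0.016% (16 active positions per 100,000 tokens).

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: density counts positions with activation > threshold,
    which may differ from how the dashboard computes it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 16 active positions out of 100,000 tokens.
sample = [0.0] * 99_984 + [1.0] * 16
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.016% means the feature is active on roughly 1 in every 6,250 tokens.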