INDEX
Explanations
phrases indicating time or availability
Negative Logits
yet          -0.20
yet          -0.19
crossings    -0.16
они          -0.15
heimer       -0.14
Lessons      -0.13
weakened     -0.13
lesh         -0.13
currently    -0.13
Likely       -0.13
Positive Logits
only         0.23
Only         0.19
ONLY         0.19
ONLY         0.18
只能         0.17
only         0.17
Only         0.17
только       0.17
aeda         0.17
tember       0.16
Activations Density 0.051%
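
The density figure above is most naturally read as the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; they are not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample (made-up values): 10,000 token positions, 5 of them active.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.051% would correspond to roughly one active position in every two thousand sampled tokens.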