Explanations
phrases that indicate physical discomfort or adaptation
Negative Logits
Gerr    -0.15
strict  -0.15
rees    -0.14
andbox  -0.14
erra    -0.14
zza     -0.14
open    -0.14
μπ      -0.14
loor    -0.14
open    -0.14
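Lists of negative and positive logits like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and taking the lowest and highest scores. The page does not say how its numbers were computed, so the following is only a minimal sketch under that assumption: `top_logit_effects`, `feature_dir`, `W_U` and the toy vocabulary are hypothetical, and the final LayerNorm is ignored.

```python
import numpy as np

def top_logit_effects(feature_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    feature_dir: (d_model,) decoder vector of one SAE feature (hypothetical input)
    W_U:         (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:       token strings aligned with the columns of W_U
    """
    # Project the feature direction straight onto the vocabulary,
    # skipping the final LayerNorm (a simplifying assumption).
    effects = feature_dir @ W_U
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["habit", "become", "strict", "open"]
W_U = np.array([[1.0, 0.5, -1.0, 0.0],
                [0.0, 0.5, 0.0, -1.0]])
feature_dir = np.array([0.25, 0.1])

neg, pos = top_logit_effects(feature_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.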
Positive Logits
habit    0.25
习       0.25
become   0.23
habit    0.23
Habit    0.23
привы    0.23
Become   0.21
hab      0.21
hab      0.21
bec      0.21
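Tokenizers that use GPT-2-style byte-level BPE store each token as a string of printable stand-in characters, one per raw byte, so dashboards sometimes show mojibake (for example `ä¹ł`) instead of the real text. The sketch below inverts that table, assuming (the page does not confirm it) that the model uses GPT-2's byte-to-unicode mapping; `bytes_to_unicode` mirrors the function of that name in OpenAI's GPT-2 encoder, while `CHAR_TO_BYTE` and `decode_token` are hypothetical helpers.

```python
def bytes_to_unicode():
    """GPT-2's table mapping each byte value to a printable stand-in character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and up.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Reverse table: stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token can end mid-character, so undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")

print(decode_token("ä¹ł"))           # 习
print(decode_token("Ð¿ÑĢÐ¸Ð²Ñĭ"))     # привы
print(repr(decode_token("Ġhabit")))  # ' habit'
```

In such vocabularies a word with and without a leading space (`Ġhabit` vs `habit`) are separate tokens, which may explain why some entries above appear twice.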
Activation Density: 0.214%
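Activation density is usually reported as the share of sampled token positions on which the feature is active, so 0.214% would mean roughly two tokens in every thousand. A minimal sketch, assuming "active" means an activation strictly above zero; the `activation_density` helper and the toy activations are hypothetical, and the sample of tokens behind the figure above is not given on the page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy run: 1,000 token positions, the feature fires on two of them.
acts = np.zeros(1000)
acts[[3, 500]] = [1.2, 0.4]
print(f"{activation_density(acts):.3f}%")  # 0.200%
```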