Explanations
phrases that indicate a large quantity or multiple options
Negative Logits
most      -0.16
opic      -0.14
anything  -0.14
eines     -0.13
358       -0.13
よく      -0.13
odings    -0.13
izz       -0.13
最        -0.13
ambda     -0.13
Positive Logits
ways       0.42
reasons    0.33
Ways       0.31
reason     0.27
reason     0.26
places     0.25
ways       0.24
Reasons    0.24
Reason     0.23
occasions  0.21
Activations Density 0.112%