INDEX
Explanations
pronouns and words indicating specific relationships and references in context
Negative Logits
wan
-0.16
らく
-0.15
łem
-0.15
IFA
-0.14
лиц
-0.13
andan
-0.13
lava
-0.13
INVAL
-0.13
енту
-0.13
onta
-0.13
Positive Logits
understanding
0.47
understand
0.43
Understanding
0.41
understands
0.37
Understanding
0.37
Understand
0.36
understood
0.33
comprehension
0.32
理解
0.31
know
0.29
Activations Density 0.048%