INDEX
Explanations
specific references to certain objects or locations within longer passages of text
Negative Logits
ornings     -0.78
wark        -0.75
ä¸ī         -0.74
HH          -0.71
swing       -0.71
Scotland    -0.69
ichick      -0.68
shine       -0.68
Mania       -0.67
ategory     -0.67
Positive Logits
entire      1.58
remainder   1.43
entirety    1.36
contents    1.33
offending   1.30
remaining   1.23
same        1.23
slightest   1.21
requisite   1.21
whole       1.21
Activations Density: 0.401%