Explanations
numbers or numeric references that have been mentioned in a textual context
the phrase "so far" indicating progress or cumulative events
Negative Logits
Token        Logit
ysis         -0.70
degree       -0.60
ounce        -0.59
ANT          -0.58
responsible  -0.57
moder        -0.56
exclusive    -0.56
unts         -0.54
thro         -0.54
Franç        -0.54
Positive Logits
Token  Logit
far    1.78
far    1.57
FAR    1.16
Far    1.07
Far    1.01
ared   0.95
bered  0.95
aring  0.93
arer   0.92
othes  0.89
Activations Density 0.059%