INDEX
Explanations
references to time in relation to an action or status
occurrences of the word "ow"
Negative Logits
Token           Logit
circulate       -0.67
ザ              -0.66
gratification   -0.66
apartheid       -0.66
ーテ            -0.64
Schwar          -0.64
ority           -0.62
pinch           -0.60
udes            -0.60
Rouge           -0.59
Positive Logits
Token           Logit
OW              1.17
ARDS            1.10
ERS             1.08
LAN             1.06
orld            1.04
OWS             1.02
DER             1.01
HEAD            1.00
IE              0.97
ING             0.95
Activations Density 0.006%