INDEX
Explanations
time indicators or durations
instances of the word "since" used to indicate a duration of time
Negative Logits
bart         -0.75
pta          -0.70
amina        -0.70
abus         -0.70
Fight        -0.69
NRS          -0.67
agy          -0.66
BILITIES     -0.63
fighter      -0.61
amount       -0.61
Positive Logits
rely          1.13
ĸļ            1.04
inception     0.79
adolescence   0.74
1979          0.74
childhood     0.73
endorsing     0.72
1945          0.72
infancy       0.71
2006          0.71
Activations Density: 0.049%