INDEX
Explanations
dates or specific time-related information
prepositions and terms related to time and place
Negative Logits
uch       -0.68
river     -0.65
THIS      -0.63
yre       -0.63
ynes      -0.60
Souls     -0.59
irin      -0.59
imilar    -0.58
¯¯¯¯      -0.58
tin       -0.58
Positive Logits
himself       0.72
ourselves     0.67
herself       0.66
itself        0.65
firsthand     0.65
themselves    0.65
its           0.63
myself        0.62
aloud         0.61
rer           0.60
Activations Density 0.488%