Explanations
references to specific time periods, such as centuries and years
references to time periods and societal contexts
Negative Logits
inki            -0.72
cause           -0.70
kefeller        -0.63
ツ              -0.62
jong            -0.62
Dro             -0.62
soType          -0.61
appropriately   -0.60
him             -0.59
cheat           -0.59
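One negative-logit token is the katakana character ツ, which a GPT-2-style byte-level BPE vocabulary stores as the string ãĥĦ. The vocabulary maps each UTF-8 byte to a printable character, so the three bytes E3 83 84 appear as three Latin letters. The ãĥĦ form suggests that this model uses GPT-2's byte-to-unicode table, but the page does not say so. Under that assumption, the following Python sketch reverses the mapping to recover readable text. The function names are illustrative, not part of any library.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2's reversible mapping from each byte value to a printable character."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    # Every other byte is shifted to code point 256 + n, in ascending byte order.
    n = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token: str) -> str:
    """Recover readable UTF-8 text from a byte-level BPE token string."""
    # Assumption: the tokenizer uses GPT-2's byte-to-unicode table above.
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A token holding only part of a multi-byte character decodes to U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĦ"))  # ツ
print(decode_token("adays"))  # adays
```

Plain ASCII tokens such as "adays" pass through unchanged, because printable bytes map to themselves.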
Positive Logits
there           1.00
,               0.91
adays           0.83
we              0.81
it              0.80
however         0.80
these           0.77
tens            0.76
nobody          0.75
they            0.73
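The page does not state how the two logit lists are produced. This sketch assumes a common approach for sparse-autoencoder features: take the dot product of the feature's decoder direction with each token's unembedding vector, then report the largest and smallest results. The direction, the unembedding vectors, and the function names below are hypothetical toy values, not this feature's real weights.

```python
import heapq


def logit_effects(direction: list[float], unembed: dict[str, list[float]]) -> dict[str, float]:
    """Dot the (assumed) feature decoder direction with every token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects: dict[str, float], k: int):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


# Hypothetical 2-dimensional toy values, not real model weights.
direction = [1.0, 0.5]
unembed = {
    "there": [0.8, 0.4],
    "we": [0.6, 0.2],
    "cause": [-0.5, -0.4],
    "him": [-0.3, -0.2],
}
positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
print([t for t, _ in positive])  # ['there', 'we']
print([t for t, _ in negative])  # ['cause', 'him']
```

`heapq.nlargest` and `heapq.nsmallest` avoid sorting the whole vocabulary, which matters when the real unembedding has tens of thousands of tokens.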
Activation Density: 0.281%
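At 0.281%, the feature fires on about 281 of every 100,000 tokens. This sketch treats activation density as the percentage of token positions where the feature's activation is above zero. That definition, the `threshold` parameter, and the 100,000-token toy sample are assumptions for illustration, not the page's documented method.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of positions where the activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 281 firing positions out of 100,000 tokens.
sample = [0.0] * 99_719 + [1.5] * 281
print(f"{activation_density(sample):.3f}%")  # 0.281%
```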