INDEX
Explanations
numerical data, such as dates, quantities, and measurements
the presence of end-of-text markers or punctuation indicating the end of a document
Negative Logits
withd        -0.82
neighb       -0.78
predec       -0.70
destro       -0.69
behavi       -0.67
akespe       -0.66
administr    -0.64
infl         -0.64
undermin     -0.64
assum        -0.61
Positive Logits
Died            0.77
Introduction    0.73
Killed          0.73
09              0.71
2018            0.71
1945            0.71
Learns          0.71
2018            0.71
Joined          0.69
06              0.69
Activations Density: 0.155%