INDEX
Explanations
references to historical figures and events
historical figures and their relationships or lineage
Negative Logits
icone      -0.64
safety     -0.63
bounce     -0.61
rollout    -0.59
Safety     -0.58
stakes     -0.57
healthy    -0.56
hassle     -0.56
DHS        -0.54
itivity    -0.54
Positive Logits
philosopher  0.77
histor       0.75
Ottoman      0.75
scholar      0.72
Britann      0.71
theolog      0.70
Athen        0.68
historian    0.67
colonial     0.65
1968         0.65
Activations Density: 3.321%