INDEX
Explanations
terms related to historical events or subjects
references to the concept of history
Negative Logits
igans     -0.70
vae       -0.65
leased    -0.64
enger     -0.63
aton      -0.63
orc       -0.63
spir      -0.62
activate  -0.61
raised    -0.60
legate    -0.60
Positive Logits
textbooks      1.23
buffs          1.19
books          1.12
lesson         1.07
documentaries  0.92
repeats        0.87
professor      0.86
textbook       0.86
repeating      0.84
lessons        0.84
Activations Density 0.069%