INDEX
Explanations
references to specific historical events or figures
Negative Logits
Artifact      -0.15
immers        -0.15
irts          -0.14
çĿ£           -0.14
okable        -0.14
eventdata     -0.14
tempor        -0.14
irit          -0.13
org           -0.13
thalm         -0.13
Positive Logits
treat         0.19
Treat         0.16
_epi          0.15
MSS           0.15
Periph        0.15
writing       0.15
Nest          0.14
chin          0.14
Transmission  0.14
Epic          0.14
Activations Density: 0.113%