Explanations
terms related to impact and scrutiny, particularly in contexts of danger and control
Negative Logits
isation    -0.29
ization    -0.23
itation    -0.19
ication    -0.18
isations   -0.18
IZATION    -0.17
ation      -0.17
ATION      -0.17
iliation   -0.17
igation    -0.16
Positive Logits
undry      0.15
aset       0.15
atest      0.15
esda       0.14
äl         0.14
atically   0.14
:invoke    0.14
ewhat      0.14
antly      0.14
nowled     0.14
Activations Density 0.116%