INDEX
Explanations
terms related to critique or criticism
Negative Logits
ha      -0.19
idity   -0.17
ience   -0.15
HA      -0.15
erule   -0.15
ths     -0.15
iyah    -0.15
lds     -0.14
esco    -0.14
hurst   -0.14
Positive Logits
icism   0.29
ters    0.29
ically  0.26
ter     0.24
éri     0.21
ics     0.21
iques   0.18
icial   0.18
izens   0.18
elpers  0.18
Activations Density 0.007%