Explanations
percentages or frequency-related expressions
Negative Logits
opi       -0.16
Sanford   -0.15
lean      -0.15
pio       -0.15
ogo       -0.14
sterile   -0.14
546       -0.14
ngr       -0.14
inv       -0.14
radient   -0.14
Positive Logits
heimer    0.17
ayet      0.16
azer      0.15
azal      0.15
worth     0.15
awl       0.14
ace       0.14
ện        0.14
vard      0.14
ệu        0.14
Activations Density 0.001%