Explanations
phrases related to regulation and standards in various fields
Negative Logits
Manip       -0.68
squared     -0.68
rob         -0.66
heit        -0.66
Bey         -0.65
Breaker     -0.65
Ramos       -0.64
Fernand     -0.64
owler       -0.62
Swed        -0.61
Positive Logits
uit          1.25
uits         1.25
hip          1.13
ibilities    0.95
ury          0.93
ensical      0.93
ĸļ           0.92
chwitz       0.92
uries        0.88
rences       0.87
Activation Density: 0.020%