INDEX
Explanations
phrases that indicate examination or observation of data or situations
Negative Logits
ings        -0.20
/remove     -0.17
sv          -0.16
ucene       -0.16
/disable    -0.16
/write      -0.15
ıb          -0.15
ighth       -0.14
oad         -0.14
idal        -0.14
Positive Logits
redient     0.23
redients    0.22
/testing    0.20
ly          0.19
gg          0.19
/loading    0.19
tour        0.17
wi          0.17
oneself     0.16
REDIENT     0.16
Activations Density 0.117%