INDEX
Explanations
phrases that express opinions or perspectives
Negative Logits
Probe    -0.15
YK       -0.15
ptions   -0.14
アー     -0.14
Barcl    -0.14
ुह       -0.13
ären     -0.13
porto    -0.13
ctors    -0.13
eba      -0.13
Positive Logits
infer    0.16
IFF      0.15
meter    0.14
Pastor   0.14
icho     0.14
кому     0.14
권       0.14
ùi       0.14
娘       0.13
тю       0.13
Activations Density 0.022%