INDEX
Explanations
phrases related to argumentation and reasoning
Negative Logits
acos        -0.15
ukt         -0.14
cker        -0.14
narr        -0.14
athon       -0.13
irl         -0.13
lov         -0.13
Kir         -0.13
Gazette     -0.13
arResult    -0.13
Positive Logits
lider       0.18
uce         0.14
istrat      0.14
ibi         0.14
ULE         0.14
hc          0.14
ohana       0.13
656         0.13
ê³ł         0.13
ogh         0.13
Activations Density 0.241%