INDEX
Explanations
phrases indicating evidence or inference
phrases indicating speculation or conjecture
Negative Logits
izons      -0.78
ategory    -0.75
]+         -0.69
itement    -0.68
irling     -0.68
ced        -0.68
iling      -0.67
Airl       -0.65
avorite    -0.65
orem       -0.65
Positive Logits
probable    0.82
unclear     0.77
ãĤ¨         0.75
doubtful    0.75
imaru       0.72
unfair      0.71
BUS         0.70
abundantly  0.68
reasonable  0.68
Ī           0.67
Activations Density: 0.082%