INDEX
Explanations
phrases or sentences indicating problems or issues
repeated mentions of the word "problem."
Negative Logits
rib        -0.94
orks       -0.92
tein       -0.86
umber      -0.86
urses      -0.83
htaking    -0.81
vez        -0.80
collar     -0.80
rica       -0.78
cul        -0.78
Positive Logits
plag           0.99
Problem        0.99
confronting    0.83
solved         0.81
solving        0.80
facing         0.77
naires         0.75
problems       0.74
Problem        0.73
problem        0.72
Activations Density: 0.036%