Explanations
phrases indicating potential actions, decisions, or opinions
conditional statements or phrases indicating hypothetical scenarios
Negative Logits
76561        -0.65
Shots        -0.62
Introduced   -0.60
senal        -0.59
OTA          -0.58
IDES         -0.58
Passenger    -0.58
iling        -0.56
HT           -0.56
View         -0.56
Positive Logits
raining      0.95
beh          0.88
etsk         0.78
dawn         0.76
iner         0.76
unclear      0.73
ÃĥÃĤ         0.73
easier       0.71
ironic       0.70
impossible   0.68
Activations Density 0.218%