Explanations
references to a specific word "Twe" in the text
mentions of a specific brand or product related to technology
Negative Logits
Token        Logit
ãĥĩ          -0.77
inated       -0.73
inating      -0.70
senal        -0.70
ozo          -0.67
iott         -0.66
inates       -0.64
ONT          -0.64
predatory    -0.63
UAL          -0.63
Positive Logits
Token        Logit
eden         1.14
Twe          0.99
ety          0.92
akens        0.88
edy          0.88
ollen        0.87
Twe          0.87
riter        0.85
ritten       0.85
ets          0.85
Activations Density: 0.019%